Incorporating heterogeneous biological data sources in clustering gene expression data

Authors

Gang-Guo Li

Zheng-Zhi Wang

Abstract

In this paper, a similarity measure between genes based on protein-protein interactions is proposed. ChIP-chip data are converted into the same form as gene expression data, with Pearson correlation as the similarity measure. Based on the similarity measures for protein-protein interaction data and ChIP-chip data, a combined dissimilarity measure is defined. This combined measure is introduced into the K-means method, yielding what can be considered an improved K-means method. The improved K-means method and three other clustering methods are evaluated on a real dataset. Performance is assessed by a prediction accuracy analysis using known gene annotations. Our results show that the improved K-means method outperforms the other clustering methods. The performance of the improved K-means method is also tested by varying the tuning coefficients of the combined dissimilarity measure. The results show that incorporating heterogeneous data sources in clustering gene expression data is helpful and meaningful, and that the coefficients for genome-wide or complete data sources should be given larger values when constructing the combined dissimilarity measure.
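The idea of a combined dissimilarity measure with tuning coefficients can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' method: the abstract does not give the formulas, so the weighted average in `combined_dissimilarity`, the binary placeholder in `ppi_dissimilarity`, the toy data, and the coefficient values are all hypothetical. Because standard K-means needs coordinates rather than a precomputed dissimilarity matrix, the sketch substitutes a simple k-medoids loop for the improved K-means step.

```python
import numpy as np


def pearson_dissimilarity(profiles):
    """Return 1 - Pearson correlation between the rows of `profiles`."""
    d = 1.0 - np.corrcoef(profiles)
    np.fill_diagonal(d, 0.0)  # remove floating-point noise on the diagonal
    return d


def ppi_dissimilarity(adjacency):
    """Placeholder (assumption): 0 for interacting pairs, 1 otherwise."""
    d = 1.0 - np.asarray(adjacency, dtype=float)
    np.fill_diagonal(d, 0.0)
    return d


def combined_dissimilarity(parts, weights):
    """Weighted average of dissimilarity matrices (assumed form).

    `weights` play the role of the tuning coefficients.
    """
    w = np.asarray(weights, dtype=float)
    return sum(wi * di for wi, di in zip(w, parts)) / w.sum()


def k_medoids(D, k, n_iter=100):
    """Cluster from a precomputed dissimilarity matrix D (k-medoids, not K-means)."""
    # Deterministic farthest-first initialisation.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        updated = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: the member with the smallest total
                # dissimilarity to the rest of its cluster.
                cost = D[np.ix_(members, members)].sum(axis=1)
                updated[c] = members[np.argmin(cost)]
        if np.array_equal(updated, medoids):
            break
        medoids = updated
    return np.argmin(D[:, medoids], axis=1), medoids


# Toy data (hypothetical): 6 genes in two groups of three.
expr = np.array([[1.0, 2.0, 3.0, 4.0], [1.1, 2.0, 3.2, 3.9], [0.9, 2.1, 2.9, 4.2],
                 [4.0, 3.0, 2.0, 1.0], [4.1, 2.9, 2.2, 1.0], [3.8, 3.1, 1.9, 1.1]])
chip = np.array([[0.9, 0.1, 0.8], [0.8, 0.2, 0.9], [1.0, 0.1, 0.7],
                 [0.1, 0.9, 0.2], [0.2, 0.8, 0.1], [0.1, 1.0, 0.3]])
ppi = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (3, 4), (4, 5)]:  # deliberately incomplete network
    ppi[i, j] = ppi[j, i] = 1.0

D = combined_dissimilarity(
    [pearson_dissimilarity(expr), ppi_dissimilarity(ppi), pearson_dissimilarity(chip)],
    weights=[0.5, 0.2, 0.3],  # hypothetical tuning coefficients
)
labels, medoids = k_medoids(D, k=2)
print(labels)
```

In this toy setup the expression data receive the largest coefficient and the incomplete interaction network the smallest, in the spirit of the abstract's closing recommendation; the specific values are arbitrary.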


Similar resources

Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis

Recognizing genes with distinctive expression levels can help in the prevention, diagnosis and treatment of diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...


Clustering Genes Using Heterogeneous Data Sources

Clustering of gene expression data is a standard exploratory technique used to identify closely related genes. Many other sources of data are also likely to be of great assistance in the analysis of gene expression data. These data provide a means to begin elucidating the large-scale modular organization of the cell. The authors consider the challenging task of developing exploratory analytical ...


Modèles d'intégration de la connaissance pour la fouille des données d'expression des gènes. (Knowledge Integration Models for Mining Gene Expression Data)

In the framework of this thesis we develop new data mining models for knowledge discovery with gene expression profiles. Data mining is the science of automatically extracting knowledge hidden in large data sets. Gene expression technologies are powerful methods for studying biological processes from a transcriptional point of view. These technologies have produced vast amounts of data by mea...


Clustering gene expression data using random forest dissimilarity

Background: The clustering of gene expression data plays an important role in the diagnosis and treatment of cancer. These kinds of data typically involve a large number of variables (genes) compared with the number of samples (patients). Many clustering methods have been built on the dissimilarity among observations, which is calculated by a distance function. As increa...


Incorporating biological knowledge into distance-based clustering analysis of microarray gene expression data

MOTIVATION Because co-expressed genes are likely to share the same biological function, cluster analysis of gene expression profiles has been applied for gene function discovery. Most existing clustering methods ignore known gene functions in the process of clustering. RESULTS To take advantage of accumulating gene functional annotations, we propose incorporating known gene functions into a n...




Publication date: 2009
